static website
The biggest and last website to move over to new hardware was THOK.ORG itself. Bits of this website go back decades, to a slightly overclocked 486DX/25 on a DSL line - while static websites have some significant modern advantages, the classic roots are in "not actually having real hardware to run one". That said, it does have a lot of sentimental value, and a lot of personal memory - mainly personal project notes, for things like "palm pilot apps" or "what even is this new blogging thing" - so I do care about keeping it running, but at the same time am a little nervous about it.
(Spoiler warning: as of this posting, the conversion is complete and mostly uneventful, and I've made updates to the new site - this is just notes on some of the conversion process.)
Why is a static site complicated?
"static site" can mean a lot of things, but the basic one is that the web server itself only delivers files over http/https and doesn't do anything dynamic to actually deliver the content.1 This has security benefits (you don't have privilege boundaries if there are no privileges) and run-time complexity benefits (for one example, you're only using the most well-tested paths through the server code) but it also has testing and reliability benefits - if you haven't changed anything in the content, you can reasonably expect that the server isn't going to do anything different with it, so if it worked before, it works now.
This also means that you will likely have a "build" step where you
take the easiest-to-edit form and turn it into deliverable HTML.
Great for testing - you can render locally, browse locally, and then
push the result to the live site - but it does mean that you want some
kind of local tooling, even if it's just the equivalent of
find | xargs pandoc
and a stylesheet.
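A minimal sketch of how small that local tooling can be, written in Python instead of shell. The directory layout, the style.css name, and the pandoc_commands and build helpers are assumptions for illustration, not what THOK.ORG uses; the pandoc options (--standalone, --css, -o) are real ones.

```python
import subprocess
from pathlib import Path


def pandoc_commands(source_root, output_root, stylesheet="style.css"):
    """Yield (destination, argv) pairs, one per markdown file under source_root."""
    source_root = Path(source_root)
    output_root = Path(output_root)
    for src in sorted(source_root.rglob("*.md")):
        # Mirror the source tree: notes/foo.md -> notes/foo.html
        dest = output_root / src.relative_to(source_root).with_suffix(".html")
        argv = ["pandoc", "--standalone", "--css", stylesheet,
                "-o", str(dest), str(src)]
        yield dest, argv


def build(source_root, output_root, stylesheet="style.css"):
    """Render every markdown file; needs pandoc installed and on PATH."""
    for dest, argv in pandoc_commands(source_root, output_root, stylesheet):
        dest.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(argv, check=True)
```

Keeping the command generation separate from the part that runs pandoc means the mapping from source to output paths can be checked without rendering anything.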
For THOK.ORG, I cared very little about style and primarily wanted to
put up words (and code snippets) -
Markdown was the obvious
choice, but it hadn't been invented yet! I was already in the habit of
writing up project notes using a hotkey that dropped a username and
datestamp marker in a file, and then various "rich text" conventions
from 1990's email (nothing more than italic, bold, and code) - I
wasn't even thinking of them as markup, just as conventions that
people recognized in email without further rendering. So while the
earliest versions of the site were just HTML, later ones were a little
code to take "project log" files and expand them into blog-like
entries. All very local, README
→ README.html
and that was it.
Eventually I wrote a converter that turned the project logs into
"proper" markdown - not a perfect one (while using a renderer helped
bring my conventions in line with what rendered ok, I never managed to
really formalize it and some stuff was just poorly rendered), just one
that was good enough that I could clean up the markdown by hand and go
all in on it. There was a "side trip" of using
Tumblr as a convenient mobile
blogging service - phone browsers were just good enough that I could
write articles in markdown on a phone with a folding bluetooth
keyboard at pycon.ca
and get stuff online directly - I didn't
actually stick with this and eventually converted them back to local
markdown blogs (and then still didn't update them.)
Finally (2014 or so) I came up with a common unifying tool to drag
bits of content together and do all of the processing for the content
I'd produced over the years. thoksync
included a dependency
declaration system that allowed parallelized processing, and various
performance hacks that have been overtaken by Moore's Law in the last
decade. The main thing is that it was fast enough to run in a git
post-update
hook so when I pushed changes to markdown files, they'd
get directly turned into live site updates. Since I was focussed on
other things in the meantime (including a new startup in 2015) and the
code worked I hadn't really touched it in the last decade... so it
was still python 2 code.
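For a sense of what "dependency declarations that allow parallel processing" can look like today, here is a sketch using only the modern standard library (graphlib needs python 3.9 or later). This is not the thoksync code or its interface; run_in_dependency_order and the shape of the deps mapping are assumptions made up for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_in_dependency_order(deps, action, max_workers=4):
    """Run action(target) for every target, never before its dependencies.

    deps maps each target to the set of targets it depends on.
    Returns the targets in the order they finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    pending = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Everything whose dependencies are done can start right away.
            for target in sorter.get_ready():
                pending[pool.submit(action, target)] = target
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the worker
                target = pending.pop(future)
                sorter.done(target)  # may unlock further targets
                finished.append(target)
    return finished
```

Independent pages render side by side, while something like an index page that depends on all of them waits until they are done.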
Python 2 to Python 3 conversion
Having done a big chunk of work (including a lot of review, guidance, and debugging) on a python 3 conversion of a commercial code base, I was both familiar with the process and had not expected to ever need to touch it again - the product conversion itself was far later than was in any way reasonable, and most other companies would have been forced to convert sooner. It was a bit of a surprise to discover another 2000+ lines of python 2 code that was My Problem!
While there were only a few small CLI-tool tests in the code (which I was nonetheless glad to have) I did have the advantage of a "perfect" test suite - the entire thok.org site. All I had to do was make sure that the rendering from the python 3 code matched the output from the python 2 code - 80,000 lines of HTML that should be the same should be easy to review, right?
This theory worked out reasonably well at first - any time the partially converted code crashed, well, that was obviously something that needed fixing.
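Comparing two complete builds can be sketched in a few lines of Python. This is an assumed stand-in for whatever diffing was actually done; compare_builds and relative_files are hypothetical names. It compares bytes rather than text lines so that images and other binaries are covered too.

```python
from pathlib import Path


def relative_files(root):
    """Set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def compare_builds(old_root, new_root):
    """Return (only_in_old, only_in_new, differing) as sorted lists of relative paths."""
    old_root, new_root = Path(old_root), Path(new_root)
    old_files = relative_files(old_root)
    new_files = relative_files(new_root)
    differing = sorted(
        rel for rel in old_files & new_files
        if (old_root / rel).read_bytes() != (new_root / rel).read_bytes()
    )
    return sorted(old_files - new_files), sorted(new_files - old_files), differing
```

Three empty lists mean the port renders the site byte-for-byte the same; anything in the third list is worth opening in a real diff tool.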
Port the entire thok build from python2 to python3
(tested by diffing the built site with leap - found some corrupted images/large binaries)
* #! update
* print → print(), file= (in comments too)
* Popen(text=True)
* except/as
* drop (object)
* lost feedvalidator so minivalidate.py doesn't actually work yet
* file → open
* open → with open as
* rfc822 → email.utils (parsedate, formatdate)
* argument "tuple unpacking" is gone
* SimpleHTTPServer, BaseHTTPServer → http.server
* isinstance(basestring) → isinstance(str) (just to reverse-detect etree fragments)
* markdown.inlinepatterns.Pattern → InlineProcessor (old API exists but it made more sense to debug the new one)
* etree no longer in markdown.util
* grouping no longer mangled, so group(1) is correct
* different return interface
* add → register
* string hack for WikiLinkExtension arguments no longer works
* lxml.xml.tostring → encoding="unicode" in a few places to json-serialize sanely
* in a few places, keep it bytes but open("w" → "wb") instead
* thokrss: dependency tracking → tracker (was *never* right, just untested __main__ code)
* python 2 allowed sorting functions by id; python 3 doesn't, so just extract the names in key=
* tumblr2thoksync: long → int
* transformer.py: remove a bunch of unused imports
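A few of those items, reduced to small python 3 examples. These are illustrative sketches, not excerpts from the thok build; the function names are made up, and the XML example uses the standard library's xml.etree.ElementTree rather than lxml (lxml.etree.tostring accepts the same encoding="unicode" argument).

```python
import calendar
import email.utils
import json
import xml.etree.ElementTree as ET


def roundtrip_date(header_value):
    """rfc822.parsedate and formatdate now live in email.utils.

    parsedate ignores the zone offset, so this only round-trips GMT values.
    """
    parsed = email.utils.parsedate(header_value)  # 9-item time tuple
    return email.utils.formatdate(calendar.timegm(parsed), usegmt=True)


def sort_callbacks(callbacks):
    """python 3 won't compare function objects, so sort on an explicit key."""
    return sorted(callbacks, key=lambda func: func.__name__)


def span(point):
    """def span((start, end)) is a syntax error now; unpack in the body."""
    start, end = point
    return end - start


def fragment_as_json(element):
    """tostring() returns bytes by default; encoding="unicode" yields a str."""
    return json.dumps({"html": ET.tostring(element, encoding="unicode")})
```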
... get to the python rendering code ... point to staticsite ... mention nagaina
-
This definition of static doesn't preclude things with client-side javascript - I've seen one form of static site where the server delivered markdown files directly to the client and the javascript rendered them there, which is almost clever but requires some visible mess in the files, so I've never been that tempted; it would also mean implementing my own markdown extensions in javascript instead of python, and... no. ↩
Got far enough into staticsite
that it was time to go beyond the
basic blog, and the ice cream blog turns
out to be a good testbed for that.
Fix the images
Images (specifically, jpg files from cameras or modern cellphones)
are, by default, large and messy, despite staticsite doing clever
things with img.srcset. It turns out that there's a stack of
problems:
- ImageMagick convert doesn't update (or discard) EXIF.width and EXIF.height when resizing, and later parts of the toolchain (probably including the browser itself) get misled by small images with large dimensions.
- Certain parts of the staticsite markdown processing path end up giving absolute instead of relative links to the produced images (still looking for where though) and so if you make a local sandbox copy of the main site, some of the img files that the browser fetches actually come from the upstream live site instead of the sandbox, completely confusing your debugging process.
- I really want the images to use bootstrap's img-fluid which I can add using the markdown "attributes" extension, which is already turned on, but I want it consistently site wide.
On top of that, it may turn out that the part of the problem I care
about needs to be fixed in the python-markdown
layer instead of
staticsite
itself, but it may just be "non-overridable python code"1
rather than something I even can fix in a theme.
Current solutions:
- github ticket #70 filed to describe the <img> problem and hostname part.
- Use the python-markdown attribute extension {: class="img-fluid"} manually on all images, so that they scale-to-fit regardless of what processing they've been through.
- Wrote a little icecream-start shop-name that takes kpa-grep output and fills in a blank markdown file with a title and filled in ![]() image includes for each image (so I can write the article and just delete the unneeded images as I go along - which will work better once #70 is fixed, for now half of the images go upstream instead of locally.)
- Bigger hammer: icecream-start now uses jhead -autorot -purejpg 2 which just rotates them losslessly and wipes out any conflicting EXIF metadata. This, combined with img-fluid and a width-clamp in site.css, was the minimal "image-heavy pages are actually good now" set of changes.
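The article-stub idea can be sketched in Python as follows. This is not the real icecream-start: article_stub and jhead_command are hypothetical names, and it assumes the images sit next to the markdown file so that bare file names work as links. The jhead flags are the ones quoted above.

```python
from pathlib import Path


def jhead_command(image_paths):
    """argv for the lossless rotate plus metadata strip, one call for all files."""
    return ["jhead", "-autorot", "-purejpg", *map(str, image_paths)]


def article_stub(shop_name, image_paths):
    """Markdown skeleton: a title plus one img-fluid image include per photo."""
    lines = [f"# {shop_name}", ""]
    for path in image_paths:
        # attr_list syntax: the {: ...} block directly follows the image
        lines.append(f'![]({Path(path).name}){{: class="img-fluid"}}')
        lines.append("")
    return "\n".join(lines)
```

Writing the result of article_stub to a new .md file leaves only the prose to fill in and the unwanted image lines to delete.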
Finish taxonomy support
staticsite
has Hugo-style taxonomies (to the point of linking to
them for documentation.) It does a fine job building index pages, but
stops there. The two followons to make them useful are
- Link those index pages in the navbar (or the sidebar, but for photo-heavy mobile use I find that the sidebar is an utter failure, so my first template effort was to turn that off and use full width ("12 column" in bootstrap terms).)
- The default page templates include the tags at the bottom, but only if they're from the tags taxonomy. Turns out we can just iterate over the available taxonomies and render all of them.
Current solutions:
- navbar config is one line in the index.md metadata, done.
- replacing the "tags for this article" with "all tags for all taxonomies for this article" was some simple nested loops in Jinja2, once I got past the scoping problem below.
A future possibility is to add some markup (possibly subverting the
wikilinks
syntax, or maybe just using links with a magic urltype) that lets me
just use the tags in-line in the text without having to put them in
the per-post metadata. (Future, not blocking for now, and ideally it
would just be a hook into the same taxonomy
plumbing.)
The template changes ran into some issues:
- Jinja2 macros are file scoped, so an attempt to replace a single macro (like inline_page as called by pages) is silently ignored; instead you need to replace the entire file including the otherwise unchanged calling macro (at which point you might consider giving up on extending the existing theme in the first place.)
- Some of the ssite subcommands will parse a .staticsite.py or settings.py in the top level of the site source, which would let you configure a theme; important ones like ssite show ignore that entirely and require a --theme argument.
- For a while this looked like "syntactically bad themes (or settings) were silently not imported"; that turns out not to be true, it just wasn't importing them at all because the config was ignored instead.
- The existing settings aren't actually in-scope in the settings file; though you may be able to import the global settings, it's not clear that those are the correct ones after other processing.
- Some of the data structures visible in the template act like strings but aren't strings - so for example, you can iterate over the taxonomies, and if you render that inline you get the names, but you can't then get the taxonomy from there because you end up attempting to use the object as a key and not the name. On top of that, python code in jinja2 templates has very limited access to python builtins - so you don't have dir or str (though you can simulate the latter with "" ~ var, it's not great.) Turns out that most of these objects have a .name you can use directly, but I haven't found good documentation for that - but at this point, I recognize it as a pattern, so "just try .name" is part of my experimentation repertoire.
System dark mode
blag
had what turns out to be really simple bits of CSS3 for a dark
mode that turns on when the browser is in dark mode (usually triggered
by "system" darkmode, through xsettings
and GTK themes.) It's worth
adding that to the staticsite
theme if we can do it in a simple way.
Current solutions:
- Within the theme directory, static/css/*.css get installed, so just copy the default site.css there and add extra files that it explicitly @import's.
- Specifically, @import "bootstrap-color-fix.css" screen and (prefers-color-scheme: dark); isolates all of the horror - so providing a color mode is only one mechanical line of CSS.
- To create that file, just copy /usr/share/javascript/bootstrap4/css/bootstrap.css (include attribution comments, it is MIT licensed) and delete everything that isn't a color, which gets it down to about 700 entries; then cook up a little elisp to "invert" a color string in the buffer. Yes, this is gruesomely brute force - but it's short term: bootstrap 5.3 has proper dark-mode support built in, so when staticsite upgrades (not something I'm prepared to tackle myself right now1) we can just discard these changes and use that support instead. (I don't actually want any in-page controls for this, just automatic support for the viewer's system or in-browser choices.)
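The color inversion itself is small. The original was done in elisp inside the editor; the sketch below is a Python stand-in for the same brute-force idea, with invert_colors as a made-up name. It only handles #rgb and #rrggbb literals, so named colors and rgba() values would need separate treatment, and an id selector that happens to look like hex (say #add) would be mangled as well.

```python
import re

# Six-digit form first so "#ffffff" isn't matched as the three-digit "#fff".
HEX_COLOR = re.compile(r"#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def invert_hex(match):
    """Replace one hex color with its per-channel complement."""
    digits = match.group(1)
    if len(digits) == 3:
        # Expand shorthand: #abc means #aabbcc
        digits = "".join(ch * 2 for ch in digits)
    inverted = 0xFFFFFF - int(digits, 16)
    return f"#{inverted:06x}"


def invert_colors(css_text):
    """Invert every hex color literal found in a block of CSS."""
    return HEX_COLOR.sub(invert_hex, css_text)
```

For example, invert_colors("color: #fff;") gives "color: #000000;".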
More markdown extensions
It's a little messy to even turn on extensions; the documentation
(doc/reference/pages/markdown.md
) says you can set
MARKDOWN_EXTENSIONS
but it doesn't actually say where and see the
problem above about things ignoring settings.py
.
Aside from
wikilinks
for in-line taxonomy reference,
I'd like to turn on whatever makes bare URLs into links;
SO
suggests just using <>
which I'd forgotten, but also gives both a
(mildly flawed) sample extension for it and a pointer to
markdown2
which has
link-patterns
as a mechanism for this.
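One low-tech option, sketched here purely as an assumption and not as anything staticsite or python-markdown provides: pre-process the markdown source so that bare URLs pick up the <> wrapping before the renderer sees them. bracket_bare_urls is a hypothetical helper, and the pattern is deliberately conservative. It skips URLs that already follow <, (, [ or a double quote, but it knows nothing about code blocks.

```python
import re

BARE_URL = re.compile(r'(?<![<(\["])\bhttps?://[^\s<>()\[\]"]+')


def _wrap(match):
    url = match.group(0)
    # Keep sentence punctuation outside the link.
    trimmed = url.rstrip(".,;:!?")
    return "<" + trimmed + ">" + url[len(trimmed):]


def bracket_bare_urls(text):
    """Wrap bare http(s) URLs in <> so markdown renders them as links."""
    return BARE_URL.sub(_wrap, text)
```

A real extension would hook into the inline parser instead, which is the only reliable way to avoid touching code spans.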
Geography
Saw Simon Willison's experiments with OpenFreeMap and MapLibre and realized it would be really easy to lay out my Ice Cream Journey on it. Not sure it's worth actually hosting an entire tileset (when by definition I only need Massachusetts), and later on I might just stash maps at various static zoom levels or something simple like that. For now, though, it's responsive and doesn't need an API key, and the Javascript interface is straightforward.
In fact, my use of the interface is probably too straightforward -
rather than being generated from page metadata, there's just a
hard-coded list of Names, markdown page names, and lat/long pairs, and
two dozen lines of code to forEach
the place list and create a
maplibregl.Marker
attached to a maplibregl.Popup
for each; through
the glory of Unicode, we can even have 🍨 markers for general ice
cream and 🍦 for places that specialize in soft-serve. That all works
fine, the only manual step is adding a single line of data to the
map.html
file for every review I do - technically moving it into
per-page metadata wouldn't be less work, or more robust in any way,
but it feels like the right place for it, so I'll get to that
eventually.
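Moving the list into per-page metadata could look something like this on the build side. It is a sketch under assumptions: the lat, lon, and soft_serve metadata keys and the places_from_pages helper are invented for illustration, and how the resulting JSON gets into map.html is left open. The one fixed point is that MapLibre takes coordinates as [longitude, latitude].

```python
import json


def places_from_pages(pages):
    """pages is an iterable of (page_name, metadata dict) pairs.

    Pages without coordinates are skipped.
    """
    places = []
    for page_name, meta in pages:
        if "lat" not in meta or "lon" not in meta:
            continue
        places.append({
            "name": meta.get("title", page_name),
            "page": page_name,
            "lnglat": [meta["lon"], meta["lat"]],  # longitude first
            "marker": "🍦" if meta.get("soft_serve") else "🍨",
        })
    return places


def places_json(pages):
    """Serialize the place list for the map page, keeping the emoji readable."""
    return json.dumps(places_from_pages(pages), ensure_ascii=False, indent=2)
```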
Since this is still an experiment, I didn't want to just have "Map" in
the navbar, I wanted a specific experimental marker in the title. The
definition of the navbar is just a list in the metadata of index.md
itself, but the titles are expected to be in the metadata of each of
those pages - the main trick here is that raw html files aren't,
they're actually J2Page Jinja2 templates, so you can stuff a
{% block front_matter %} inside an HTML comment, and that works as a
clean way to hide the metadata.4
Page Width
One final issue (and one of the only design aspects I've gotten feedback about from readers5) is that on a wide screen, the pictures are too huge and the text ends up ridiculously wide. It took decades but the web design industry did realize that the newspaper industry's use of narrow columns was good for reading,6 but Bootstrap itself doesn't appear to have any useful defaults for this (or even any good stackoverflow answers.) All it needs is
@media (min-width: 40em) {
  .container-fluid {
    width: 40em;
  }
}
(adjust 40em
to taste, but probably keep it in character-width
units to stay consistent with other user preference choices.) All
this declares is that if the screen is 40em
wide or larger, set
the outermost bootstrap container width to 40em
; this keeps smaller
size layouts unchanged, and breaks smoothly as you get larger.
-
It's open source python code, everything is overridable, but for me it's a big step towards just writing a new engine (or adding these features to one of my old ones) which I'm specifically shying away from in this moment. ↩↩
-
github:Matthias-Wandel/jhead, yes, that Matthias Wandel of youtube woodworking fame. ↩
-
See blag style.css for the prefers-color-scheme conditionals in @media stanzas; a mere 8 lines for each scheme. ↩
-
Both of them! Dark mode, on the other hand, was entirely implemented for me personally, and worth the effort to get working when I was still looking at the site in draft, regardless of anyone else ever seeing it. ↩
-
Even though it had very little to do with that and was more of an artifact of how to assemble type in frames for printing, up through linotype and phototypesetting in column inches that were literally pasted up. ↩
My earlier attempts to distill blogging (and blog creation) down from a software and sysadmin task to "just name something and start writing" have kind of failed, but as I'm shuffling around hardware and feeling inspired to procrastinate by writing, I'm doing another pass.
Given that I'm python-oriented, I wanted something primarily in python, open source, with extra points for "maintained in Debian" and "I haven't failed to use it previously."
Blag
blag
is maintained by a Debian
developer, easy to get launched, is named after an XKCD
comic, and I actually put 3 draft blogs
together with it in a couple of days before trying the next thing.
(In particular, I had one site that was going to mostly be collected
essays and with some blog bits, and not primarily a blog, though I
still wanted an index and RSS and tagging, I had some trouble
reorganizing that one into the right shape.)
Definitely still worth a look, especially for anything "actually blog shaped" - I had filled half a whiteboard with notes on what I actually wanted before I stumbled on the next candidate, so it was very helpful in getting me to define what I meant by "static site blogging" and how that was different from what I thought I meant. Unlike many of the other systems discussed here, the developer actually notices github issues, which is commendable.
Staticsite
staticsite
caught my eye
in an odd sort of way - it's still a markdown blog with other
features, an instant-blog tutorial (doc/tutorial/blog.md
), and some
obvious tooling. What stood out was that it had Hugo-inspired taxonomy support -
when tags aren't enough but you want kinds of tags, this lets you
name and label a group, and have automatic lists of pages in the navbar,
just by using them (and creating one two-line file.) This was
attractive, especially for my ice cream
blog which is itself completely serious
but also serves as a playground for tooling and rendering ideas; ice
cream shops have flavors, towns, and novelties and I can just drop a
little metadata on each page.
(2024-08-07 side note: still fixing some details like actually including those on the pages themselves like tags are1, doing user defined themes2 at all, and fixing the image handling3; I'm not stuck on any of those, just merely-part-way into them.)
(2024-08-21 side note: fixed the above and I'm using it live - see staticsite-itself for more in-depth usage and customization.)
Others
Others I've glanced at - didn't really dismiss, they just didn't end
up on the fast-path before I got to staticsite
:
Pelican
pelican
is in Debian, and the initial
description starts with metadata in a post; this wasn't originally
an objectionable issue, but after using blag
and staticsite
I find
I really want a minimal post to need no more than a # title
(though
I certainly want to be able to add metadata later, that's "being
organizational", not "blogging", and is minor unexpected friction.) Is
this excessive? Certainly, but I'm also someone that recommends that
developers learn to touch-type (and pick an editor) early in their
careers - I'm already committed to being excessive about flow and friction.
Nikola
Nikola python, markdown, MathJax; also heavy
on the required metadata (and seems to require a new_post
command.
ssite new
is similar but optional, and is really just a generic "run
a template for me" tool.) Looks very featureful, I was just in the
mood for something with less rope.
Hyde
Hyde is named as a pun on Jekyll (a popular github-pages-capable ruby static site tool) - not in Debian, is on pypi but last release was 9 years ago, the description page has many dead links, and doesn't yet have a completed python3 port.
other sources
- https://wiki.python.org/moin/StaticSiteGenerator (I'd forgotten for a moment that moinmoin itself is not static, I used to use it for a homedir-only wiki though.) staticjinja is on this list and is even more minimal/"raw templating" than staticsite; not so much as a recommendation but a point on the curve describing the shape of these things.
; not so much as a recommendation but a point on the curve describing the shape of these things. - https://www.reddit.com/r/Python/comments/rja4l2/what_is_the_best_python_static_site_generator/ turns up high in google but the only relevant bits are Sphinx, Lektor (newish) and mkdocs-material.
- https://jamstack.org/generators/ looks exhaustive, to the point of
including a number of long-dead examples among the currently 54
listed (and
staticsite
itself isn't on the python list.) The backing repo has a pile of untouched pull requests, so it's likely to stay out of date.
-
turns out that ssite show ignores .staticsite.py so you can't set an explicit path to a theme, but it takes a --theme argument; misleadingly, ssite shell does read the settings. There are probably 2 or 3 issues here, I'm just not sure which ones are real (the "show ignores settings" bit might just be an under-documented security concern) and haven't filed them yet. ↩
-
recently figured out that ImageMagick convert -resize produces a smaller JPEG, but doesn't update the EXIF data which definitely misleads the browser, and is probably also misleading ssite when it generates the smaller images (since it also doesn't discard the EXIF data.) Again, still needs a couple of experiments where I do clean up and let it re-run before deciding which parts are actually issues. (In the end, I stomped on the native size-handling with bootstrap's img-fluid.) ↩